Improved phone recognition on TIMIT using formant frequency data and confidence measures
نویسندگان
چکیده
This paper presents a novel approach to integration of formant frequency and conventional MFCC data in phone recognition experiments on TIMIT. Naive use of format data introduces classification errors if formant frequency estimates are poor, resulting in a net drop in performance. However, by exploiting a measure of confidence in the formant frequency estimates, formant data can contribute to classification in parts of a speech signal where it is reliable, and be replaced by conventional MFCC data when it is not. In this way an improvement of 4.7% is achieved. Moreover, by exploiting the relationship between formant frequencies and vocal tract geometry, simple formant-based vocal tract length normalisation reduces the error rate by 6.1% relative to a conventional representation alone.
منابع مشابه
Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM
This paper describes how non-linear formant trajectories, based on ‘trajectory HMM’ proposed by Tokuda et al., can be exploited under the framework of multiple-level segmental HMMs. In the resultant model, named a non-linear/linear multiple-level segmental HMM, speech dynamics are modeled as non-linear smooth trajectories in the formant-based intermediate layer. These formant trajectories are m...
متن کاملModelling Speech Signals using Formant Frequencies as an Intermediate Representation
This paper concerns Multiple-level Segmental HiddenMarkov Models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation. New TIMIT phone recognition results are presented, confirming that the theoretical upper-bound on performance is achieved provided that either the intermediate representation or t...
متن کاملComparative Experiments to Evaluate Acoustic Distinctive Features and Forma Recognition Using a Multi-s
This paper presents an evaluation of the use of some auditory-based acoustic distinctive features and formant cues for automatic speech recognition (ASR). Comparative experiments have indicated that the use of either the formant magnitudes or the formant frequencies combined with some auditory-based acoustic distinctive features and the classical MFCCs within a multi-stream statistical framewor...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملStatistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...
متن کامل